readxl package
List the sheets of an Excel file
Before you can start importing from Excel, you should find out which sheets are available in the workbook. You can use the excel_sheets() function for this.
You will find the Excel file urbanpop.xlsx in your working directory (type dir() to see it). This dataset contains urban population metrics for practically all countries in the world throughout time (Source: Gapminder). It contains three sheets for three different time periods. In each sheet, the first row contains the column names.
# Load the readxl package
library(readxl)
# Print the names of all worksheets
excel_sheets("../xDatasets/urbanpop.xlsx")## [1] "1960-1966" "1967-1974" "1975-2011"
As you can see, the result of excel_sheets() is simply a character vector; you haven’t imported anything yet. That’s something for the read_excel() function.
Import an Excel sheet
Now that you know the names of the sheets in the Excel file you want to import, it is time to import those sheets into R. You can do this with the read_excel() function. Have a look at this recipe:
data <- read_excel("data.xlsx", sheet = "my_sheet")
This call simply imports the sheet with the name "my_sheet" from the "data.xlsx" file. You can also pass a number to the sheet argument; this will cause read_excel() to import the sheet with the given sheet number. sheet = 1 will import the first sheet, sheet = 2 will import the second sheet, and so on.
# Read the sheets, one by one
pop_1 <- read_excel("../xDatasets/urbanpop.xlsx", sheet = 1)
pop_2 <- read_excel("../xDatasets/urbanpop.xlsx", sheet = 2)
pop_3 <- read_excel("../xDatasets/urbanpop.xlsx", sheet = 3)
# Put pop_1, pop_2 and pop_3 in a list: pop_list
pop_list = list(pop_1, pop_2, pop_3)
# Display the structure of pop_list
str(pop_list)## List of 3
## $ :Classes 'tbl_df', 'tbl' and 'data.frame': 209 obs. of 8 variables:
## ..$ country: chr [1:209] "Afghanistan" "Albania" "Algeria" "American Samoa" ...
## ..$ 1960 : num [1:209] 769308 494443 3293999 NA NA ...
## ..$ 1961 : num [1:209] 814923 511803 3515148 13660 8724 ...
## ..$ 1962 : num [1:209] 858522 529439 3739963 14166 9700 ...
## ..$ 1963 : num [1:209] 903914 547377 3973289 14759 10748 ...
## ..$ 1964 : num [1:209] 951226 565572 4220987 15396 11866 ...
## ..$ 1965 : num [1:209] 1000582 583983 4488176 16045 13053 ...
## ..$ 1966 : num [1:209] 1058743 602512 4649105 16693 14217 ...
## $ :Classes 'tbl_df', 'tbl' and 'data.frame': 209 obs. of 9 variables:
## ..$ country: chr [1:209] "Afghanistan" "Albania" "Algeria" "American Samoa" ...
## ..$ 1967 : num [1:209] 1119067 621180 4826104 17349 15440 ...
## ..$ 1968 : num [1:209] 1182159 639964 5017299 17996 16727 ...
## ..$ 1969 : num [1:209] 1248901 658853 5219332 18619 18088 ...
## ..$ 1970 : num [1:209] 1319849 677839 5429743 19206 19529 ...
## ..$ 1971 : num [1:209] 1409001 698932 5619042 19752 20929 ...
## ..$ 1972 : num [1:209] 1502402 720207 5815734 20263 22406 ...
## ..$ 1973 : num [1:209] 1598835 741681 6020647 20742 23937 ...
## ..$ 1974 : num [1:209] 1696445 763385 6235114 21194 25482 ...
## $ :Classes 'tbl_df', 'tbl' and 'data.frame': 209 obs. of 38 variables:
## ..$ country: chr [1:209] "Afghanistan" "Albania" "Algeria" "American Samoa" ...
## ..$ 1975 : num [1:209] 1793266 785350 6460138 21632 27019 ...
## ..$ 1976 : num [1:209] 1905033 807990 6774099 22047 28366 ...
## ..$ 1977 : num [1:209] 2021308 830959 7102902 22452 29677 ...
## ..$ 1978 : num [1:209] 2142248 854262 7447728 22899 31037 ...
## ..$ 1979 : num [1:209] 2268015 877898 7810073 23457 32572 ...
## ..$ 1980 : num [1:209] 2398775 901884 8190772 24177 34366 ...
## ..$ 1981 : num [1:209] 2493265 927224 8637724 25173 36356 ...
## ..$ 1982 : num [1:209] 2590846 952447 9105820 26342 38618 ...
## ..$ 1983 : num [1:209] 2691612 978476 9591900 27655 40983 ...
## ..$ 1984 : num [1:209] 2795656 1006613 10091289 29062 43207 ...
## ..$ 1985 : num [1:209] 2903078 1037541 10600112 30524 45119 ...
## ..$ 1986 : num [1:209] 3006983 1072365 11101757 32014 46254 ...
## ..$ 1987 : num [1:209] 3113957 1109954 11609104 33548 47019 ...
## ..$ 1988 : num [1:209] 3224082 1146633 12122941 35095 47669 ...
## ..$ 1989 : num [1:209] 3337444 1177286 12645263 36618 48577 ...
## ..$ 1990 : num [1:209] 3454129 1198293 13177079 38088 49982 ...
## ..$ 1991 : num [1:209] 3617842 1215445 13708813 39600 51972 ...
## ..$ 1992 : num [1:209] 3788685 1222544 14248297 41049 54469 ...
## ..$ 1993 : num [1:209] 3966956 1222812 14789176 42443 57079 ...
## ..$ 1994 : num [1:209] 4152960 1221364 15322651 43798 59243 ...
## ..$ 1995 : num [1:209] 4347018 1222234 15842442 45129 60598 ...
## ..$ 1996 : num [1:209] 4531285 1228760 16395553 46343 60927 ...
## ..$ 1997 : num [1:209] 4722603 1238090 16935451 47527 60462 ...
## ..$ 1998 : num [1:209] 4921227 1250366 17469200 48705 59685 ...
## ..$ 1999 : num [1:209] 5127421 1265195 18007937 49906 59281 ...
## ..$ 2000 : num [1:209] 5341456 1282223 18560597 51151 59719 ...
## ..$ 2001 : num [1:209] 5564492 1315690 19198872 52341 61062 ...
## ..$ 2002 : num [1:209] 5795940 1352278 19854835 53583 63212 ...
## ..$ 2003 : num [1:209] 6036100 1391143 20529356 54864 65802 ...
## ..$ 2004 : num [1:209] 6285281 1430918 21222198 56166 68301 ...
## ..$ 2005 : num [1:209] 6543804 1470488 21932978 57474 70329 ...
## ..$ 2006 : num [1:209] 6812538 1512255 22625052 58679 71726 ...
## ..$ 2007 : num [1:209] 7091245 1553491 23335543 59894 72684 ...
## ..$ 2008 : num [1:209] 7380272 1594351 24061749 61118 73335 ...
## ..$ 2009 : num [1:209] 7679982 1635262 24799591 62357 73897 ...
## ..$ 2010 : num [1:209] 7990746 1676545 25545622 63616 74525 ...
## ..$ 2011 : num [1:209] 8316976 1716842 26216968 64817 75207 ...
we will learn how to use both the excel_sheets() and the read_excel() function in combination with lapply() to read multiple sheets at once.
Reading a workbook
In the previous exercise you generated a list of three Excel sheets that you imported. However, loading in every sheet manually and then merging them in a list can be quite tedious. Luckily, you can automate this with lapply().
Have a look at the example code below:
my_workbook <- lapply(excel_sheets("data.xlsx"),
read_excel,
path = "data.xlsx")The read_excel() function is called multiple times on the "data.xlsx" file and each sheet is loaded in one after the other. The result is a list of data frames, each data frame representing one of the sheets in data.xlsx.
# Read all Excel sheets with lapply(): pop_list
pop_list <- lapply(excel_sheets("../xDatasets/urbanpop.xlsx"),
read_excel,
path = "../xDatasets/urbanpop.xlsx")
# Display the structure of pop_list
str(pop_list)## List of 3
## $ :Classes 'tbl_df', 'tbl' and 'data.frame': 209 obs. of 8 variables:
## ..$ country: chr [1:209] "Afghanistan" "Albania" "Algeria" "American Samoa" ...
## ..$ 1960 : num [1:209] 769308 494443 3293999 NA NA ...
## ..$ 1961 : num [1:209] 814923 511803 3515148 13660 8724 ...
## ..$ 1962 : num [1:209] 858522 529439 3739963 14166 9700 ...
## ..$ 1963 : num [1:209] 903914 547377 3973289 14759 10748 ...
## ..$ 1964 : num [1:209] 951226 565572 4220987 15396 11866 ...
## ..$ 1965 : num [1:209] 1000582 583983 4488176 16045 13053 ...
## ..$ 1966 : num [1:209] 1058743 602512 4649105 16693 14217 ...
## $ :Classes 'tbl_df', 'tbl' and 'data.frame': 209 obs. of 9 variables:
## ..$ country: chr [1:209] "Afghanistan" "Albania" "Algeria" "American Samoa" ...
## ..$ 1967 : num [1:209] 1119067 621180 4826104 17349 15440 ...
## ..$ 1968 : num [1:209] 1182159 639964 5017299 17996 16727 ...
## ..$ 1969 : num [1:209] 1248901 658853 5219332 18619 18088 ...
## ..$ 1970 : num [1:209] 1319849 677839 5429743 19206 19529 ...
## ..$ 1971 : num [1:209] 1409001 698932 5619042 19752 20929 ...
## ..$ 1972 : num [1:209] 1502402 720207 5815734 20263 22406 ...
## ..$ 1973 : num [1:209] 1598835 741681 6020647 20742 23937 ...
## ..$ 1974 : num [1:209] 1696445 763385 6235114 21194 25482 ...
## $ :Classes 'tbl_df', 'tbl' and 'data.frame': 209 obs. of 38 variables:
## ..$ country: chr [1:209] "Afghanistan" "Albania" "Algeria" "American Samoa" ...
## ..$ 1975 : num [1:209] 1793266 785350 6460138 21632 27019 ...
## ..$ 1976 : num [1:209] 1905033 807990 6774099 22047 28366 ...
## ..$ 1977 : num [1:209] 2021308 830959 7102902 22452 29677 ...
## ..$ 1978 : num [1:209] 2142248 854262 7447728 22899 31037 ...
## ..$ 1979 : num [1:209] 2268015 877898 7810073 23457 32572 ...
## ..$ 1980 : num [1:209] 2398775 901884 8190772 24177 34366 ...
## ..$ 1981 : num [1:209] 2493265 927224 8637724 25173 36356 ...
## ..$ 1982 : num [1:209] 2590846 952447 9105820 26342 38618 ...
## ..$ 1983 : num [1:209] 2691612 978476 9591900 27655 40983 ...
## ..$ 1984 : num [1:209] 2795656 1006613 10091289 29062 43207 ...
## ..$ 1985 : num [1:209] 2903078 1037541 10600112 30524 45119 ...
## ..$ 1986 : num [1:209] 3006983 1072365 11101757 32014 46254 ...
## ..$ 1987 : num [1:209] 3113957 1109954 11609104 33548 47019 ...
## ..$ 1988 : num [1:209] 3224082 1146633 12122941 35095 47669 ...
## ..$ 1989 : num [1:209] 3337444 1177286 12645263 36618 48577 ...
## ..$ 1990 : num [1:209] 3454129 1198293 13177079 38088 49982 ...
## ..$ 1991 : num [1:209] 3617842 1215445 13708813 39600 51972 ...
## ..$ 1992 : num [1:209] 3788685 1222544 14248297 41049 54469 ...
## ..$ 1993 : num [1:209] 3966956 1222812 14789176 42443 57079 ...
## ..$ 1994 : num [1:209] 4152960 1221364 15322651 43798 59243 ...
## ..$ 1995 : num [1:209] 4347018 1222234 15842442 45129 60598 ...
## ..$ 1996 : num [1:209] 4531285 1228760 16395553 46343 60927 ...
## ..$ 1997 : num [1:209] 4722603 1238090 16935451 47527 60462 ...
## ..$ 1998 : num [1:209] 4921227 1250366 17469200 48705 59685 ...
## ..$ 1999 : num [1:209] 5127421 1265195 18007937 49906 59281 ...
## ..$ 2000 : num [1:209] 5341456 1282223 18560597 51151 59719 ...
## ..$ 2001 : num [1:209] 5564492 1315690 19198872 52341 61062 ...
## ..$ 2002 : num [1:209] 5795940 1352278 19854835 53583 63212 ...
## ..$ 2003 : num [1:209] 6036100 1391143 20529356 54864 65802 ...
## ..$ 2004 : num [1:209] 6285281 1430918 21222198 56166 68301 ...
## ..$ 2005 : num [1:209] 6543804 1470488 21932978 57474 70329 ...
## ..$ 2006 : num [1:209] 6812538 1512255 22625052 58679 71726 ...
## ..$ 2007 : num [1:209] 7091245 1553491 23335543 59894 72684 ...
## ..$ 2008 : num [1:209] 7380272 1594351 24061749 61118 73335 ...
## ..$ 2009 : num [1:209] 7679982 1635262 24799591 62357 73897 ...
## ..$ 2010 : num [1:209] 7990746 1676545 25545622 63616 74525 ...
## ..$ 2011 : num [1:209] 8316976 1716842 26216968 64817 75207 ...
The col_names argument
Apart from path and sheet, there are several other arguments you can specify in read_excel(). One of these arguments is called col_names.
By default it is TRUE, denoting whether the first row in the Excel sheets contains the column names. If this is not the case, you can set col_names to FALSE. In this case, R will choose column names for you. You can also choose to set col_names to a character vector with names for each column. It works exactly the same as in the readr package.
You’ll be working with the urbanpop_nonames.xlsx file. It contains the same data as urbanpop.xlsx but has no column names in the first row of the excel sheets.
# Import the first Excel sheet of urbanpop_nonames.xlsx (R gives names): pop_a
pop_a <- read_excel("../xDatasets/urbanpop_nonames.xlsx", col_names = FALSE)## New names:
## * `` -> `..1`
## * `` -> `..2`
## * `` -> `..3`
## * `` -> `..4`
## * `` -> `..5`
## * ... and 3 more
# Import the first Excel sheet of urbanpop_nonames.xlsx (specify col_names): pop_b
cols <- c("country", paste0("year_", 1960:1966))
pop_b <- read_excel("../xDatasets/urbanpop_nonames.xlsx", col_names = cols)
# Print the summary of pop_a
sum_pop_a <- as.data.frame(do.call(cbind, lapply(pop_a, summary)))## Warning in (function (..., deparse.level = 1) : number of rows of result is
## not a multiple of vector length (arg 1)
sum_pop_a %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = T, position = "left", , font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")| ..1 | ..2 | ..3 | ..4 | ..5 | ..6 | ..7 | ..8 | |
|---|---|---|---|---|---|---|---|---|
| Min. | 209 | 3378 | 1028.344532 | 1089.933922 | 1153.603196 | 1217.930058 | 1281.49467 | 1348.89911 |
| 1st Qu. | character | 88977.5 | 70644.108696 | 74973.524208 | 81869.73424 | 84952.961648 | 88633.26703 | 93638.353542 |
| Median | character | 580675 | 570158.53128 | 593967.538248 | 619331.07068 | 645261.590826 | 679109.1204 | 735139.3668 |
| Mean | 209 | 4988124.33333333 | 4991613.4620824 | 5141591.79013524 | 5303711.07780988 | 5468965.88001294 | 5637394.23704651 | 5790281.36374468 |
| 3rd Qu. | character | 3077228.5 | 2807280.200284 | 2948396.094116 | 3148940.981158 | 3296444.49587 | 3317422.0327 | 3418036.001856 |
| Max. | character | 126469700 | 129268132.666 | 131974142.696 | 134599886.436 | 137205240.336 | 139663053.37 | 141962708.16 |
| NA’s | 209 | 11 | 1028.344532 | 1089.933922 | 1153.603196 | 1217.930058 | 1281.49467 | 1348.89911 |
# Print the summary of pop_b
sum_pop_b <- as.data.frame(do.call(cbind, lapply(pop_b, summary))) ## Warning in (function (..., deparse.level = 1) : number of rows of result is
## not a multiple of vector length (arg 1)
sum_pop_b %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = T, position = "left", , font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")| country | year_1960 | year_1961 | year_1962 | year_1963 | year_1964 | year_1965 | year_1966 | |
|---|---|---|---|---|---|---|---|---|
| Min. | 209 | 3378 | 1028.344532 | 1089.933922 | 1153.603196 | 1217.930058 | 1281.49467 | 1348.89911 |
| 1st Qu. | character | 88977.5 | 70644.108696 | 74973.524208 | 81869.73424 | 84952.961648 | 88633.26703 | 93638.353542 |
| Median | character | 580675 | 570158.53128 | 593967.538248 | 619331.07068 | 645261.590826 | 679109.1204 | 735139.3668 |
| Mean | 209 | 4988124.33333333 | 4991613.4620824 | 5141591.79013524 | 5303711.07780988 | 5468965.88001294 | 5637394.23704651 | 5790281.36374468 |
| 3rd Qu. | character | 3077228.5 | 2807280.200284 | 2948396.094116 | 3148940.981158 | 3296444.49587 | 3317422.0327 | 3418036.001856 |
| Max. | character | 126469700 | 129268132.666 | 131974142.696 | 134599886.436 | 137205240.336 | 139663053.37 | 141962708.16 |
| NA’s | 209 | 11 | 1028.344532 | 1089.933922 | 1153.603196 | 1217.930058 | 1281.49467 | 1348.89911 |
Did you spot the difference between the summaries? It’s really crucial to correctly tell R whether your Excel data contains column names. If you don’t, the head of the data frame you end up with will contain incorrect information.
The skip argument
Another argument that can be very useful when reading in Excel files that are less tidy, is skip. With skip, you can tell R to ignore a specified number of rows inside the Excel sheets you’re trying to pull data from. Have a look at this example:
read_excel("data.xlsx", skip = 15)
In this case, the first 15 rows in the first sheet of "data.xlsx" are ignored.
If the first row of this sheet contained the column names, this information will also be ignored by readxl. Make sure to set col_names to FALSE or manually specify column names in this case!
# Import the second sheet of urbanpop.xlsx, skipping the first 21 rows: urbanpop_sel
urbanpop_sel <- read_excel("../xDatasets/urbanpop.xlsx", sheet = 2, col_names = FALSE, skip = 21)## New names:
## * `` -> `..1`
## * `` -> `..2`
## * `` -> `..3`
## * `` -> `..4`
## * `` -> `..5`
## * ... and 4 more
# Print out the first observation from urbanpop_sel
urbanpop_sel[1,]## # A tibble: 1 x 9
## ..1 ..2 ..3 ..4 ..5 ..6 ..7 ..8 ..9
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Benin 382022. 411859. 443013. 475611. 515820. 557938. 602093. 648410.
Time to learn about another package to import data from Excel: gdata.
gdata package
Import a local file
In this part of the chapter you’ll learn how to import .xls files using the gdata package. Similar to the readxl package, you can import single Excel sheets from Excel sheets to start your analysis in R.
You’ll be working with the urbanpop.xls dataset, the .xls version of the Excel file you’ve been working with before.
# Load the gdata package
library(gdata)
prl <- "C:/myperl/perl/bin/perl5.28.1.exe"
# Import the second sheet of urbanpop.xls: urban_pop
urban_pop <- read.xls("../xDatasets/urbanpop.xls", perl = prl)
# Print the first 11 observations using head()
urban_pop %>%
head(11) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = T, position = "left", , font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")| country | X1960 | X1961 | X1962 | X1963 | X1964 | X1965 | X1966 |
|---|---|---|---|---|---|---|---|
| Afghanistan | 769308 | 814923.049 | 858521.698 | 903913.86 | 951225.94 | 1000582.35 | 1058743.47 |
| Albania | 494443 | 511802.780 | 529438.851 | 547376.75 | 565571.75 | 583982.89 | 602512.17 |
| Algeria | 3293999 | 3515147.548 | 3739963.007 | 3973289.13 | 4220987.01 | 4488175.64 | 4649105.24 |
| American Samoa | NA | 13660.298 | 14165.797 | 14758.93 | 15396.42 | 16044.82 | 16693.11 |
| Andorra | NA | 8723.921 | 9700.346 | 10748.38 | 11865.86 | 13052.75 | 14216.81 |
| Angola | 521205 | 548265.046 | 579695.370 | 612086.70 | 645261.59 | 679109.12 | 717833.40 |
| Antigua and Barbuda | 21699 | 21635.051 | 21664.200 | 21740.74 | 21830.18 | 21908.89 | 22003.13 |
| Argentina | 15224096 | 15545222.586 | 15912120.018 | 16282345.35 | 16654412.49 | 17027711.84 | 17389812.09 |
| Armenia | 957974 | 1008597.321 | 1061426.399 | 1115612.32 | 1170683.41 | 1226270.42 | 1281581.61 |
| Aruba | 24996 | 28139.757 | 28532.729 | 28763.12 | 28923.39 | 29082.53 | 29252.23 |
| Australia | 8375329 | 8587694.566 | 8841890.588 | 9055934.70 | 9279084.65 | 9507271.80 | 9768314.91 |
There seems to be a lot of missing data, but read.xls() knows how to handle it.
read.xls() wraps around read.table()
Remember how read.xls() actually works? It basically comes down to two steps: converting the Excel file to a .csv file using a Perl script, and then reading that .csv file with the read.csv() function that is loaded by default in R, through the utils package.
This means that all the options that you can specify in read.csv(), can also be specified in read.xls().
# Column names for urban_pop
columns <- c("country", paste0("year_", 1967:1974))
# Finish the read.xls call
urban_pop <- read.xls("../xDatasets/urbanpop.xls", sheet = 2,
skip = 50, header = FALSE, stringsAsFactors = FALSE,
col.names = columns,
perl = prl)
# Print first 10 observation of urban_pop
urban_pop[1:10, ] %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = T, position = "left", , font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")| country | year_1967 | year_1968 | year_1969 | year_1970 | year_1971 | year_1972 | year_1973 | year_1974 |
|---|---|---|---|---|---|---|---|---|
| Cyprus | 231929.74 | 237831.38 | 243983.34 | 250164.52 | 261213.21 | 272407.99 | 283774.90 | 295379.83 |
| Czech Republic | 6204409.91 | 6266304.50 | 6326368.97 | 6348794.89 | 6437055.17 | 6572632.32 | 6718465.53 | 6873458.18 |
| Denmark | 3777552.62 | 3826785.08 | 3874313.99 | 3930042.97 | 3981360.12 | 4028247.92 | 4076867.28 | 4120201.43 |
| Djibouti | 77788.04 | 84694.35 | 92045.77 | 99845.22 | 107799.69 | 116098.23 | 125391.58 | 136606.25 |
| Dominica | 27550.36 | 29527.32 | 31475.62 | 33328.25 | 34761.52 | 36049.99 | 37260.05 | 38501.47 |
| Dominican Republic | 1535485.43 | 1625455.76 | 1718315.40 | 1814060.00 | 1915590.38 | 2020157.01 | 2127714.45 | 2238203.87 |
| Ecuador | 2059355.12 | 2151395.14 | 2246890.79 | 2345864.41 | 2453817.78 | 2565644.81 | 2681525.25 | 2801692.62 |
| Egypt | 13798171.00 | 14248342.19 | 14703858.22 | 15162858.52 | 15603661.36 | 16047814.69 | 16498633.27 | 16960827.93 |
| El Salvador | 1345528.98 | 1387218.33 | 1429378.98 | 1472181.26 | 1527985.34 | 1584758.18 | 1642098.95 | 1699470.87 |
| Equatorial Guinea | 75364.50 | 77295.03 | 78445.74 | 78411.07 | 77055.29 | 74596.06 | 71438.96 | 68179.26 |
Work that Excel data!
Now that you can read in Excel data, let’s try to clean and merge it. You already used the cbind() function some exercises ago. Let’s take it one step further now.
Make sure the first column of urban_sheet2 and urban_sheet3 are removed, so you don’t have duplicate columns.
# Add code to import data from all three sheets in urbanpop.xls
path <- "../xDatasets/urbanpop.xls"
urban_sheet1 <- read.xls(path, sheet = 1, stringsAsFactors = FALSE, perl = prl)
urban_sheet2 <- read.xls(path, sheet = 2, stringsAsFactors = FALSE, perl = prl)
urban_sheet3 <- read.xls(path, sheet = 3, stringsAsFactors = FALSE, perl = prl)
# Extend the cbind() call to include urban_sheet3: urban
urban <- cbind(urban_sheet1, urban_sheet2[-1], urban_sheet3[-1])
# Remove all rows with NAs from urban: urban_clean
urban_clean <- na.omit(urban)
# Print out a summary of urban_clean
sum_urban_clean <- as.data.frame(do.call(cbind, lapply(urban_clean, summary)))
sum_urban_clean %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = T, position = "left", , font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")| country | X1960 | X1961 | X1962 | X1963 | X1964 | X1965 | X1966 | X1967 | X1968 | X1969 | X1970 | X1971 | X1972 | X1973 | X1974 | X1975 | X1976 | X1977 | X1978 | X1979 | X1980 | X1981 | X1982 | X1983 | X1984 | X1985 | X1986 | X1987 | X1988 | X1989 | X1990 | X1991 | X1992 | X1993 | X1994 | X1995 | X1996 | X1997 | X1998 | X1999 | X2000 | X2001 | X2002 | X2003 | X2004 | X2005 | X2006 | X2007 | X2008 | X2009 | X2010 | X2011 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Min. | 197 | 3378 | 3432.538728 | 3481.185172 | 3531.63859 | 3585.76687 | 3644.38707 | 3706.277064 | 3771.200916 | 3835.22167 | 3893.072722 | 3941.19174 | 4016.945328 | 4084.374744 | 4146.086628 | 4206.141156 | 4267.29312 | 4334.0081 | 4401.74592 | 4470.32184 | 4539.003 | 4607.42362 | 4644.608 | 4681.3188 | 4716.47137 | 4750.0775 | 4781.7966 | 4809.142888 | 4834.770028 | 4859.038692 | 4882.810482 | 4906.75008 | 4945.55556 | 4984.68096 | 5023.777016 | 5062.335888 | 5099.69562 | 5078.770048 | 5054.95452 | 5028.75972 | 5000.50988 | 4970.6589 | 5002.535978 | 5033.946704 | 5063.697436 | 5089.73142 | 5111.06776 | 5135.30628 | 5155.113348 | 5172.283852 | 5188.585512 | 5205.90336 | 5232.931168 |
| 1st Qu. | character | 87735 | 92904.606756 | 98330.948736 | 104988.264446 | 112084.457352 | 119321.97 | 128565.463 | 138024.19245 | 147845.8773 | 158252.45345 | 171063.02556 | 181482.89978 | 189492.09236 | 197792.41178 | 205410.218984 | 211745.77114 | 216991.27989 | 222209.073796 | 227604.825726 | 233461.342926 | 242583.3337 | 248948.025406 | 257944.331886 | 274139.40162 | 284939.32716 | 300927.86451 | 307698.837706 | 321125.123018 | 334616.166066 | 347348.192094 | 370151.77112 | 394611.435736 | 418787.854768 | 427457.223172 | 435958.850304 | 461992.88564 | 488136.271732 | 494203.061056 | 498001.634676 | 505143.58818 | 525628.65362 | 550637.602854 | 567530.80956 | 572094.270328 | 593900.32896 | 620510.59287 | 632658.612652 | 645171.676586 | 658016.598168 | 671085.46426 | 684302.43284 | 698008.7312 |
| Median | character | 599714 | 630787.860064 | 659463.754092 | 704989.332 | 740609.01508 | 774957.04434 | 809767.5527 | 838449.4164 | 890269.65478 | 929449.58876 | 976470.6 | 1008630.17884 | 1048737.86094 | 1097292.7 | 1159401.941992 | 1223145.5664 | 1249829.494308 | 1311276.14376 | 1340810.8726 | 1448185.064196 | 1592396.63611 | 1673078.53086 | 1713060.2548 | 1730625.51714 | 1749032.62549 | 1786125.21997 | 1850910.32 | 1953693.62 | 1997010.506664 | 1993543.888182 | 2066505.1884 | 2150229.608434 | 2237404.79102 | 2322157.565002 | 2410297.27952 | 2482392.9936 | 2522459.905056 | 2606124.829448 | 2664983.044388 | 2737808.865544 | 2826647.08158 | 2925851.195664 | 2928251.802 | 2944934.454 | 2994356.123856 | 3057922.97631 | 3269963.03326 | 3432024.00765 | 3589394.7688 | 3652338.21372 | 3676309.32994 | 3664663.54878 |
| Mean | 197 | 5012387.87309645 | 5282487.61712411 | 5440971.99436048 | 5612311.82411641 | 5786960.83325094 | 5964969.50655066 | 6126413.05052034 | 6288770.71247333 | 6451366.61260926 | 6624909.36219931 | 6799109.86466731 | 6980894.9171615 | 7165337.95755494 | 7349454.48592461 | 7540446.08390846 | 7731972.67019076 | 7936401.03589518 | 8145944.80521005 | 8361360.04357668 | 8583137.52307083 | 8808772.39480757 | 9049162.89768697 | 9295226.04183367 | 9545035.25760904 | 9798558.74241767 | 10058661.1780371 | 10323838.5270233 | 10595816.5773639 | 10873040.5126198 | 11154457.6425615 | 11438542.7407744 | 11725075.8943269 | 12010922.414318 | 12296948.8773616 | 12582930.152931 | 12871479.9626007 | 13165923.5120549 | 13463675.1956097 | 13762860.8232182 | 14063387.4536247 | 14369278.3499752 | 14705743.0676065 | 15043381.4434264 | 15384512.7213773 | 15730299.352009 | 16080261.7122772 | 16435871.5953336 | 16797483.6272271 | 17164897.8227208 | 17533996.7915159 | 17904811.0932721 | 18276297.2851597 |
| 3rd Qu. | character | 3130085 | 3155370.253552 | 3250211.15652 | 3416489.86485 | 3585464.2416 | 3666723.56576 | 3871757.260564 | 4019905.62298 | 4158186.16246 | 4300668.64452 | 4440047.202 | 4595966.14701 | 4766544.542256 | 4838296.681788 | 4906384.412208 | 5003370.43087 | 5121117.800904 | 5227677.348276 | 5352746.487088 | 5558849.899624 | 5815772.45784 | 6070456.957982 | 6337995.102416 | 6619986.67603 | 6918260.596384 | 6931779.72735 | 6935762.653904 | 6939904.820202 | 6945022.351696 | 6885378.320786 | 6830026.21944 | 6816589.3548 | 6820099.44 | 7139656.049106 | 7499901.168492 | 7708571.45679 | 7686092.37392 | 7664315.67911 | 7784055.519752 | 8083487.62116 | 8305563.75698 | 8421967.008 | 8448628.104 | 8622731.607028 | 8999112.337684 | 9394001.32608 | 9689807.3712 | 9803380.896 | 10210317.370344 | 10518288.968376 | 10618596.2296 | 10731192.832 |
| Max. | character | 126469700 | 129268132.666 | 131974142.696 | 134599886.436 | 137205240.336 | 139663053.37 | 141962708.16 | 144201721.584 | 146340364.368 | 148475900.598 | 150922373.04 | 152863830.642 | 154530472.704 | 156034106.334 | 157488073.512 | 159452730 | 165583752.46 | 171550309.56 | 177605736.42 | 183785364.32 | 189947471.3 | 199385257.62 | 209435967.72 | 219680097.56 | 229872397.1 | 240414889.6 | 251630158.04 | 263433513.42 | 275570541.24 | 287810746.6 | 300165617.7 | 314689997.24 | 329099365.12 | 343555326.96 | 358232229.62 | 373035156.55 | 388936607.1 | 405031715.55 | 421147610.11 | 437126845.43 | 452999146.65 | 473204511 | 493402140 | 513607776 | 533892174.75 | 554367818.4 | 575050080.56 | 595731463.86 | 616552721.82 | 637533975.76 | 658557734.5 | 678796403.04 |
Time for something totally different: XLConnect